
x86 CLMUL CRC rewrite #127

Merged: 10 commits from crc_clmul_rewrite into master on Jun 17, 2024
Conversation

@Larhzu Larhzu (Member) commented Jun 11, 2024

It's faster with both tiny and large buffers and doesn't require disabling any sanitizers.

With large buffers the extra speed comes from folding chunks in parallel. Faster versions exist for large buffers, but they offer diminishing returns in the context of XZ Utils, so wider folding and AVX2 code are skipped at least for now.
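For illustration, here is a minimal sketch of the core CLMUL fold step this kind of code is built on; it is not the code added in this PR, and the names fold_16 and fold_const are made up for the example. fold_const is assumed to hold x^a mod P and x^b mod P in its two 64-bit halves, and the intrinsic requires PCLMULQDQ support (e.g. -mpclmul with GCC/Clang). Running several such states side by side is what the parallel folding refers to.

```c
#include <immintrin.h>

/* One fold step: multiply the two 64-bit halves of the 16-byte state by
 * the matching halves of the precomputed constant (carry-less multiply),
 * then XOR in the next 16 bytes of input. */
static inline __m128i
fold_16(__m128i state, __m128i fold_const, __m128i next_chunk)
{
	__m128i lo = _mm_clmulepi64_si128(state, fold_const, 0x00);
	__m128i hi = _mm_clmulepi64_si128(state, fold_const, 0x11);
	return _mm_xor_si128(_mm_xor_si128(lo, hi), next_chunk);
}
```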

Larhzu added 8 commits June 16, 2024 12:56
It's not enough to silence the address sanitizer. Also memory and
thread sanitizers would need to be silenced. They, at least currently,
aren't smart enough to see that the extra bytes are discarded from
the xmm registers by later instructions.

Valgrind is smarter, possibly because this kind of code isn't weird
to write in assembly. Agner Fog's optimizing_assembly.pdf even mentions
this idea of doing an aligned read and then discarding the extra
bytes. The sanitizers don't instrument assembly code but Valgrind
checks all code.

It's better to change the implementation to avoid the sanitization
attributes, which also look scary in the code. (Somehow they can look
scarier than __asm__, which is implicitly unsanitized.)

See also:
#112
#122
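Roughly, the pattern being replaced looks like the following hedged illustration; this is not the removed xz code, and load_first_bytes_aligned is a made-up name. The aligned load never crosses a 16-byte (and thus never a page) boundary, and later instructions discard the extra bytes, but the sanitizers still flag the out-of-bounds read unless suppressed.

```c
#include <immintrin.h>
#include <stdint.h>

/* Load the 16-byte aligned vector containing buf[0], even if that means
 * touching a few bytes before buf that are outside the caller's buffer. */
static __m128i
load_first_bytes_aligned(const uint8_t *buf)
{
	const __m128i *aligned =
		(const __m128i *)((uintptr_t)buf & ~(uintptr_t)15);
	return _mm_load_si128(aligned);
	/* ...subsequent shifts/masks keep only the bytes inside the buffer. */
}
```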
It's a standalone program that prints the required constants.
It won't be part of the normal build of the package.
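As a rough sketch of what such a generator does (this is not crc_clmul_consts_gen.c itself, and the real generator also deals with bit reflection and the constants for the final reduction), the folding constants are powers of x reduced modulo the CRC polynomial P(x) over GF(2):

```c
#include <stdint.h>
#include <stdio.h>

/* Compute x^e mod P(x) over GF(2). "poly" is P(x) with the implicit
 * x^64 term omitted. Each loop iteration multiplies by x and reduces
 * whenever the x^64 coefficient would become one. */
static uint64_t
xpow_mod(uint64_t poly, unsigned e)
{
	uint64_t r = 1; /* the polynomial x^0 */
	while (e-- > 0) {
		uint64_t carry = r >> 63;
		r <<= 1;
		if (carry)
			r ^= poly;
	}
	return r;
}

int
main(void)
{
	/* CRC-64-ECMA polynomial used by XZ Utils, non-reflected form. */
	const uint64_t poly = 0x42F0E1EBA9EA3693;
	printf("x^128 mod P = 0x%016llX\n",
	       (unsigned long long)xpow_mod(poly, 128));
	return 0;
}
```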
Now it refers to crc_clmul_consts_gen.c. vfold8 was renamed to mu_p,
and p no longer has its lowest bit set (this makes no difference, as
the output bits it affects are ignored).
By using modulus-scaled constants, the final reduction can
be simplified.
This way it's clearer that two things cannot be selected
at the same time.
@Larhzu Larhzu force-pushed the crc_clmul_rewrite branch from 8931e99 to 888cef9 on June 16, 2024 10:30
Larhzu added 2 commits June 17, 2024 15:00
It's faster with both tiny and large buffers and doesn't require
disabling any sanitizers. With large buffers the extra speed is
from folding four 16-byte chunks in parallel.

The 32-bit x86 with MSVC reportedly still needs a workaround.
Now the simpler "__asm mov ebx, ebx" trick is enough but it
needs to be in lzma_crc64() instead of crc64_arch_optimized().
Thanks to Iouri Kharon for testing and the fix.

Thanks to Ilya Kurdyukov for testing the speed with aligned and
unaligned buffers on a few x86 processors and on E2K v6.

Thanks to Sam James for general feedback.

Fixes: #112
Fixes: #122
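A hedged sketch of the four-way folding loop described above (not this commit's code; fold_blocks and fold_4 are made-up names): four independent 16-byte states advance through the buffer 64 bytes per iteration, so the carry-less multiplications can overlap. fold_4 is assumed to hold x^(512+64) mod P and x^512 mod P, matching the 64-byte stride; the exact constants depend on the reduction convention used.

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

static void
fold_blocks(__m128i s[4], const uint8_t *buf, size_t nblocks, __m128i fold_4)
{
	while (nblocks-- > 0) {
		for (int i = 0; i < 4; ++i) {
			__m128i next = _mm_loadu_si128(
					(const __m128i *)(buf + 16 * i));
			__m128i lo = _mm_clmulepi64_si128(s[i], fold_4, 0x00);
			__m128i hi = _mm_clmulepi64_si128(s[i], fold_4, 0x11);
			s[i] = _mm_xor_si128(_mm_xor_si128(lo, hi), next);
		}
		buf += 64;
	}
	/* Afterwards the four states are folded into one and reduced
	 * to the final CRC value. */
}
```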
On E2K the function compiles only due to compiler emulation but the
function is never used. It's cleaner to omit the function when it's
not needed even though it's a "static inline" function.

Thanks to Ilya Kurdyukov.
@Larhzu Larhzu force-pushed the crc_clmul_rewrite branch from 3427b26 to 30a2d5d on June 17, 2024 12:03
@Larhzu Larhzu merged commit 30a2d5d into master Jun 17, 2024
8 checks passed
@Larhzu Larhzu deleted the crc_clmul_rewrite branch June 17, 2024 15:29